In this chapter, we discuss spatial data visualization techniques. Spatial data contains measurements of various geographical locations, areas, path, and etc. Therefore, its visualizations are often based on maps. In this chapter, we will go over a few typical map visualizations. To generate these visualizations, we need the following R packages.
library(tidyverse)
library(ggmap) # devtools::install_github("dkahle/ggmap")
library(RColorBrewer)
library(statebins)
library(viridis)
library(viridisLite)
library(geofacet) # takes long time to install.
library(geosphere)
library(grid)
library(gridExtra)
library(mapproj)
#For windows OS, you may need to install Rtools from: https://cran.r-project.org/bin/windows/Rtools/rtools40.html
A choropleth map is a type of thematic map in which areas are shaded or patterned in proportion to a statistical variable that represents an aggregate summary of a geographic characteristic within each area, such as population density or per capita income.
Choropleth maps provide an easy way to visualize how a measurement varies across a geographic area or show the level of variability within a region. A heat map is similar but does not use the geographic areas. They are the most common type of thematic map because published statistical data (from government or other sources) is generally aggregated into well-known geographic units, such as countries, states, provinces, and counties.
We have the data of the GDP per capita of each state in US in 2019. We can simply draw a barplot to visualize the data. Even though we reorder the bar according to the bar length, it is still hard to see where are the economically strong states and where are the weak ones. If we could combine the barplot with the map, the visualization will be much more informative. To generate the choropleth map, we can ggplot and maps packages together.
library(tidyverse)
library(gridExtra)
state_gdp = read.csv("data/state_gdp_per_capita_2019.csv", header = TRUE)
state_gdp %>%
filter(State != "District of Columbia") %>%
ggplot() +
geom_col(aes(x = reorder(State,-gdp_per_capita2019),
y = gdp_per_capita2019,
fill = gdp_per_capita2019),
width = 0.8)+
scale_fill_viridis_c(limits = c(40000, 100000),
breaks = seq(40000, 100000, 10000),
labels = format(seq(40000, 100000, 10000), scientific=F))+
ylab("GDP per capita")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.3, hjust = 1),
axis.title.x = element_blank())
# create a map
state_boundary = map_data("state") # from ggplot2
#head(state_boundary)
ggplot() +
geom_polygon(data=state_boundary, aes(x=long, y=lat, group=group), # try removing "group=group"
color="black", fill="lightblue" ) +
coord_map("conic", lat0 = 30)
If we can combine the two plot above, the visualization will be much more informative. If we fill in each region with a color corresponding to the value, we have the choropleth map.
state_gdp$state = tolower(state_gdp$State)
state_gdp_boud <- left_join(state_boundary, state_gdp,
by = c("region"="state"))
ggplot() +
geom_polygon( data = state_gdp_boud,
aes(x = long, y = lat, group = group,
fill = gdp_per_capita2019),
color="white", size = 0.2) +
scale_fill_viridis_c(limits = c(40000, 100000),
breaks = seq(40000, 100000, 10000),
labels = format(seq(40000, 100000, 10000),
scientific=F))+
coord_map("conic", lat0 = 30)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
We can generate choropleth map for the counties or the states.
county_boundary = map_data("county")
#head(county_boundary)
ggplot() +
geom_polygon(data=county_boundary,
aes(x=long, y=lat, group=group), # try removing "group=group"
color="black", fill="lightblue") +
coord_map("conic", lat0 = 30)
Lastly, we can generate world map too.
world_boundary = map_data("world")
ggplot() +
geom_polygon(data=world_boundary,
aes(x=long, y=lat, group=group), # try removing "group=group"
color="black", fill="lightblue" ) +
coord_fixed(ratio = 1.3)
We now analyze the college data set. Suppose we are interested in the distribution of the colleges across US, we can use choropleth map to display the number of colleges in each state.
college = read.csv("data/college.csv", header = TRUE)
state_info = data.frame(abbrev = state.abb,
fullname = state.name,
fullname_lower = tolower(state.name),
area = state.x77[,"Area"])
college_info = left_join(college, state_info, by = c("state" = "abbrev"))
college_info = college_info %>%
group_by(fullname_lower) %>%
summarize(college_number = length(state),
area = max(area)) %>%
mutate(college_number_per_1000sqmile = college_number/area*1000)
state_boundary = map_data("state") # create a data frame containing the state boundaries. this function is from ggplot2
state_boundary = left_join(state_boundary, college_info, by = c("region" = "fullname_lower"))
ggplot() +
geom_polygon(data=state_boundary,
aes(x=long, y=lat, group=group, fill=college_number),
color="black") +
scale_fill_viridis_c() +
coord_map("conic", lat0 = 30)
ggplot() +
geom_polygon(data=state_boundary,
aes(x=long, y=lat, group=group, fill=college_number_per_1000sqmile),
color="black") +
scale_fill_viridis_c(name = "# of colleges \nper 1000 sq. mile", trans="log10") +
coord_map("conic", lat0 = 30)
We can use the map as a canvas and plot locations of the places of interest, for example, a scatterplot on map. The techniques we learn for the scatterplot can be directly transferred to map visualization.
For example, we can generate a bubble plot on a map. Suppose we visualize the data set on colleges in the US. We can plot the location of the each college, the tuition, and the number of students on the map.
college = read.csv("data/college.csv",header = TRUE)
ggplot() +
geom_polygon(data = state_boundary,
aes(x = long, y = lat, group = group),
color = "black", fill = "lightblue" ) +
geom_point(data = college %>%
filter(!(state %in% c("AK","HI"))) %>%
arrange(tuition),
mapping = aes(x = lon, y = lat,
color = tuition, size = undergrads)) +
scale_color_viridis_c() +
scale_size(range = c(0.1,5)) +
coord_map("conic", lat0 = 30)
We can further conduct analysis on the map by generating density estimate and so on.
ggplot() +
geom_polygon(data = state_boundary,
aes(x = long, y = lat, group = group),
color = "black", fill = "lightblue") +
geom_point(data = college %>% filter(!(state %in% c("HI","AK"))),
mapping = aes(x = lon, y = lat), size = 1) +
stat_density2d(data = college %>% filter(!(state %in% c("HI","AK"))),
mapping = aes(x = lon, y = lat,
fill = after_stat(level)),
alpha = 0.3, size = 2, bins = 10, geom = "polygon") +
coord_map("conic", lat0 = 30)
The visualization so far is based on the plotted map which contains
only the boundaries of the states or counties. When plotting more local
information such as crime location, we need the reference of the road,
building, and etc. Therefore, it will be more convenient if we can
access the map information from one of the map website, such as Google
Map. Fortunately, ggmap is an R package that makes it easy
to retrieve raster map tiles from popular online mapping services like
Google Maps and Stamen Maps and plot them using the ggplot2
framework.
library(ggmap)
The instruction for setting up the API is at here. Another tutorial for Google platform is at here
For this course, you can use this API key, but please be careful when using the key.
ggmap::register_google(key = "AIzaSyB2dXu4azkdzKjFg8Tskk5NFPurFc2__r0")
Geocoding: We can use the geocode function to get the latitude, longitude and other information:
# geocode('Univerisity of Cincinnati',output = 'more')
## # A tibble: 1 x 9
## lon lat type loctype address north south east west
## <dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 -84.5 39.1 establis~ rooftop 2600 clifton ave, cinci~ 39.1 39.1 -84.5 -84.5
First, we need to us get_map() function to download the
map from various sources, e.g., Google map. The get_map()
function is a wrapper function for the underlying functions
get_googlemap(), get_openstreetmap(),
get_stamenmap(), and get_cloudmademap() which
accepts a wide array of arguments and returns a classed raster object
for plotting. Once the map is downloaded, we use ggmap() to
create the canvas and generate visualization.
UClayer_goo_ter <- get_map(location = "University of Cincinnati", zoom = 15,
source = "google", maptype = "terrain")
UClayer_goo_sat <- get_map(location = "University of Cincinnati", zoom = 15,
source = "google", maptype = "hybrid")
UClayer_goo_roa <- get_map(location = "University of Cincinnati", zoom = 15,
source = "google", maptype = "roadmap")
UClayer_sta_ter <- get_map(location = "University of Cincinnati", zoom = 15,
source = "stamen", maptype = "terrain")
UClayer_sta_wat <- get_map(location = "University of Cincinnati", zoom = 15,
source = "stamen", maptype = "watercolor")
UClayer_sta_ton <- get_map(location = "University of Cincinnati", zoom = 15,
source = "stamen", maptype = "toner")
Houstonlayer <- get_map(location = "Houston, TX", zoom = 11,
source = "google", maptype = "roadmap")
# We could directly use get_googlemap() instead of get_map()
Cincinnatilayer <- get_googlemap(center = c(-84.50, 39.137580),
zoom = 12, scale = 2, maptype ='terrain', color = 'color')
USA_lon_lat <- c(left = -128, bottom = 23, right = -65, top =52)
USAlayer <- get_stamenmap(USA_lon_lat, zoom = 5)
library(grid)
library(gridExtra)
g1=ggmap(UClayer_goo_ter,extent = 'device') + ggtitle("Google Maps Terrain")
g2=ggmap(UClayer_goo_sat,extent = 'device') + ggtitle("Google Maps Satellite")
g3=ggmap(UClayer_goo_roa,extent = 'device') + ggtitle("Google Maps Road")
g4=ggmap(UClayer_sta_ter,extent = 'device') + ggtitle("Stamen Maps Terrain")
g5=ggmap(UClayer_sta_wat,extent = 'device') + ggtitle("Stamen Maps Watercolor")
g6=ggmap(UClayer_sta_ton,extent = 'device') + ggtitle("Stamen Maps Toner")
grid.arrange(g1,g2,g3,g4,g5,g6,nrow = 2)
ggmapUsing the Cincinnati police data which is availabe at the Cincinnati Open Data website, we can download the location of different type of crimes in Cincinnati. We use the visualization below to display the location of types of crimes.
UC_crime = read.csv("data/UC_crime.csv", header = TRUE)
ggmap(UClayer_goo_ter) +
geom_point(data = filter(UC_crime, OFFENSE %in% c("VANDALISM","BURGLARY","ROBBERY", "ASSAULT")) ,
mapping = aes(x=LONGITUDE_X, y=LATITUDE_X), size=0.5, alpha=0.5)+
scale_color_viridis_d(guide = FALSE)+
facet_wrap(~OFFENSE,ncol = 2)
## Warning: Removed 1329 rows containing missing values (`geom_point()`).
## Warning: The `guide` argument in `scale_*()` cannot be `FALSE`. This was deprecated in
## ggplot2 3.3.4.
## ℹ Please use "none" instead.
## ℹ The deprecated feature was likely used in the ggmap package.
## Please report the issue at <https://github.com/dkahle/ggmap/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
As we can see assault takes place mostly to the south of the campus with more incidents concentrated around the southwest corner of the campus. The burglary mostly happens to the residential houses to the south of the campus. Robbery often happens to the south and to the east of the campus, where there are more shops and restaurants. This visualization clearly tells us the safe and risk areas around the campus. We can further conduct some analysis using density estimation as follows.
ggmap(UClayer_goo_ter) +
geom_point(data = filter(UC_crime, OFFENSE %in% c("ASSAULT", "ROBBERY")) ,
mapping = aes(x=LONGITUDE_X, y=LATITUDE_X), size=0.5, alpha=0.5)+
geom_density2d_filled(data = filter(UC_crime, OFFENSE %in% c("ASSAULT", "ROBBERY")) ,
mapping = aes(x=LONGITUDE_X, y=LATITUDE_X, fill=..level..),
size = 0.1, alpha=0.4)+
geom_density_2d(data = filter(UC_crime, OFFENSE %in% c("ASSAULT", "ROBBERY")) ,
mapping = aes(x=LONGITUDE_X, y=LATITUDE_X),
size = 0.25, colour = "black")+
scale_fill_viridis_d()+
facet_wrap(~OFFENSE,ncol = 2)
## Warning: The dot-dot notation (`..level..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(level)` instead.
## ℹ The deprecated feature was likely used in the ggmap package.
## Please report the issue at <https://github.com/dkahle/ggmap/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 772 rows containing non-finite values
## (`stat_density2d_filled()`).
## Warning: Removed 772 rows containing non-finite values (`stat_density2d()`).
## Warning: Removed 772 rows containing missing values (`geom_point()`).
As we can, the areas with bright color indicate the higher frequencies of the crimes, which are consistent with what we find in the previous visualization.
ggmap(UClayer_goo_ter) +
geom_point(data = filter(UC_crime, OFFENSE %in% c("ASSAULT", "ROBBERY")) ,
mapping = aes(x=LONGITUDE_X, y=LATITUDE_X), size=0.5, alpha=0.5)+
stat_density_2d(data = filter(UC_crime, OFFENSE %in% c("ASSAULT", "ROBBERY")) ,
mapping = aes(x=LONGITUDE_X, y=LATITUDE_X, fill=..level..),
size = 0.1, bins = 6, alpha=0.2,
geom = "polygon")+
facet_wrap(~OFFENSE,ncol = 2)
## Warning: Removed 772 rows containing non-finite values (`stat_density2d()`).
## Warning: Removed 772 rows containing missing values (`geom_point()`).
We can further generate route information as follows.
UC_dist = mapdist('2715 Digby Ave, Cincinnati, OH 45220',
'2733 Short Vine St, Cincinnati, OH 45219')
UC_trek <- trek("2715 Digby Ave, Cincinnati, OH 45220",
"2733 Short Vine St, Cincinnati, OH 45219",
structure = "route")
UC_dist
## # A tibble: 1 × 9
## from to m km miles seconds minutes hours mode
## <chr> <chr> <int> <dbl> <dbl> <int> <dbl> <dbl> <chr>
## 1 2715 Digby Ave, Cincinna… 2733… 1902 1.90 1.18 339 5.65 0.0942 driv…
UC_trek
## # A tibble: 37 × 3
## lat lon route
## <dbl> <dbl> <chr>
## 1 39.1 -84.5 A
## 2 39.1 -84.5 A
## 3 39.1 -84.5 A
## 4 39.1 -84.5 A
## 5 39.1 -84.5 A
## 6 39.1 -84.5 A
## 7 39.1 -84.5 A
## 8 39.1 -84.5 A
## 9 39.1 -84.5 A
## 10 39.1 -84.5 A
## # ℹ 27 more rows
g1 +
geom_path(
aes(x = lon, y = lat), colour = "blue",
size = 1.5, alpha = .5,
data = UC_trek, lineend = "round"
)
Next, we look at another data set about the Cincinnati housing price. We further download the map of Cincinnati.
Cincinnatilayer <- get_googlemap(center = c(-84.50, 39.137580),
zoom = 12, scale = 2, maptype ='terrain', color = 'color')
We analyze the Cincinnati housing price.
cincinnati_SFH = read.csv("data/cincinnati_SFH_sales_2016to2019.csv",header = TRUE)
ggmap(Cincinnatilayer)+
geom_point(data=cincinnati_SFH,aes(x=Longitude, y=Latitude, color = SalePrice),size=0.5)+
scale_color_viridis_c(limits = c(0, 500000),
breaks = seq(0, 500000, 100000),
labels = format(seq(0, 500000, 100000), scientific=F))
## Warning: Removed 11144 rows containing missing values (`geom_point()`).
ggmap(Cincinnatilayer)+
stat_summary_2d(data=cincinnati_SFH,
aes(x=Longitude, y=Latitude, z = SalePrice), fun = median, alpha=0.5, bins = 40)+
scale_fill_viridis_c(limits = c(0, 500000),
breaks = seq(0, 500000, 100000),
labels = format(seq(0, 500000, 100000), scientific=F))
## Warning: Removed 11144 rows containing non-finite values (`stat_summary2d()`).
## Warning: Removed 49 rows containing missing values (`geom_tile()`).